feat: auto-detect proxy and translate peer addresses to contact point #153
Conversation
Codecov Report

✅ All modified and coverable lines are covered by tests.

```
@@            Coverage Diff             @@
##              main     #153      +/-   ##
==========================================
+ Coverage    55.64%   55.74%   +0.09%
==========================================
  Files           21       22       +1
  Lines         4868     4879      +11
==========================================
+ Hits          2709     2720      +11
  Misses        2159     2159
==========================================
```
CI Summary — Status: ✅ All jobs passed (Rustfmt ✅, Clippy ✅, Tests ✅, Build ✅ — no issues).
Force-pushed 44508e7 to b2b55d7
@dkropachev @Lorak-mmk is this the right way to use the rust driver? Equivalent to scylladb/python-driver#833.
Force-pushed b2b55d7 to 2e38ceb
```rust
/// An [`AddressTranslator`] that redirects all peer connections to the original
/// contact point address. Used when the cluster is accessed through a proxy.
///
/// All discovered node addresses are translated to `proxy_address`, ensuring
/// the driver only connects through the proxy endpoint.
#[derive(Debug, Clone)]
pub struct ProxyAddressTranslator {
    /// The proxy/contact point address to route all connections through.
    proxy_address: SocketAddr,
}

impl ProxyAddressTranslator {
    /// Create a new translator that routes all connections to `proxy_address`.
    pub fn new(proxy_address: SocketAddr) -> Self {
        Self { proxy_address }
    }

    /// Returns the proxy address this translator routes to.
    pub fn proxy_address(&self) -> SocketAddr {
        self.proxy_address
    }
}

#[async_trait]
impl AddressTranslator for ProxyAddressTranslator {
    async fn translate_address(
        &self,
        _untranslated_peer: &UntranslatedPeer,
    ) -> Result<SocketAddr, TranslationError> {
        Ok(self.proxy_address)
    }
}
```
|
|
I'm not sure if this achieves what you want. I assume the proxy address leads to one specific node.
Let's say you have 5 nodes in the cluster, each with 32 shards. With your changes, the driver will still see all 5 nodes in system.peers/local, try to open connection pools to all 5 nodes. The address translation will cause all connections to be opened to the same node (driver won't even know about it), so you'll get 160 connections to this node.
You could use PoolSize::PerNode(1) (which is a good idea in cqlsh regardless of all other changes) to get this down to 5.
Then the connection amount problem is not that bad, but you are still in a very weird state where driver thinks it opened pools to all nodes, but they are really all to one node. Will this work correctly? It may, I'm not completely sure.
TBH I don't know how to solve that with existing APIs. There isn't really a way to implement a HostFilter that would filter out other nodes, because HostFilter accepts Peer, which has an address fetched from system.peers or system.local - so you can't really say for sure if it's the same peer as the contact point.
What would be nice here is a simplified session, with a separate builder, where driver only opens a single connection to the given address, and uses it both as CC and to execute user requests. This would also work for the maintenance socket. cc @wprzytula - let's discuss this when we meet, there are some not obvious decisions when implementing such session.
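The connection arithmetic in the comment above can be sketched as follows (the numbers come from the reviewer's example of 5 nodes with 32 shards each; the two helper functions are illustrative, not driver APIs):

```rust
// Connections opened when the pool size is configured per shard
// (the shard-aware default): every shard of every node gets a pool.
fn per_shard_total(nodes: u32, shards_per_node: u32, pool_size: u32) -> u32 {
    nodes * shards_per_node * pool_size
}

// Connections opened when the pool size is configured per host,
// PoolSize::PerHost-style: one pool per node regardless of shards.
fn per_host_total(nodes: u32, pool_size: u32) -> u32 {
    nodes * pool_size
}

fn main() {
    // 5 nodes x 32 shards, pool size 1: 160 connections, all landing
    // on the single node behind the proxy after address translation.
    assert_eq!(per_shard_total(5, 32, 1), 160);
    // With a per-host pool of 1, this drops to 5.
    assert_eq!(per_host_total(5, 1), 5);
}
```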
Agreed on the connection count concern. Added PoolSize::PerHost(1) — cqlsh is a single-user interactive tool so one connection per node is plenty. This brings it down to N connections (one per discovered node) all going through the proxy.
Re: the simplified single-connection session — that would be ideal. For now this is a pragmatic workaround. Happy to migrate when that API exists.
Makes sense. This is a weird state for the driver to be in, but I think it should work. @wprzytula will be available tomorrow if you want him to also take a look (maybe I am missing some potential problem).
I agree with your thoughts above @Lorak-mmk. Aside from that, maybe let's also use SingleTargetLoadBalancingPolicy to simplify the load balancing part?
This will be difficult for the same reason host filtering is difficult. You would need to know which node you have CC opened to, and it can no longer be done by socket address.
BTW, the code in this PR was proven to work as expected in the case where the broadcast address isn't available (AWS public/private addresses).
Anyhow, I'm going to merge this one, so it won't regress the expected behavior from the python cqlsh (before the driver change that broke it).
```rust
Ok(ScyllaDriver {
    session,
    prepared_cache: Mutex::new(HashMap::new()),
    consistency: Mutex::new(Consistency::One),
```
Have you considered using our CachingSession instead of implementing cache yourself?
first time I'm hearing about CachingSession, I don't think python has an equivalent, we'll check it out
Opened #164 to track this. The current CqlDriver trait separates prepare/execute-by-id, so it's not a trivial swap — but worth doing as a follow-up refactor.
> first time I'm hearing about CachingSession, I don't think python have the equivalent, we'll check it out
scylladb/scylla-rust-driver#333 is open for nearly 5 years now...
When connecting through a proxy/load balancer (e.g., AWS NLB, PrivateLink), the driver discovers internal node IPs from system.peers that are unreachable from the client. This installs a ProxyAddressTranslator that redirects all peer connections to the original contact point address. Since known_node addresses are never translated by the scylla driver (only peer addresses from system.peers are), this is safe for both direct and proxy connections.
- Resolve DNS hostnames via tokio::net::lookup_host instead of addr.parse::<SocketAddr>(), which silently fails for domain names
- Add PoolSize::PerHost(1) to limit connections per node (cqlsh is single-user; also mitigates connection explosion through the proxy)
- Remove unused detect_proxy() function and proxy_address() getter
- Trim module docs to match the actual always-install strategy

Refs #164
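The first bullet's DNS fix can be sketched with the blocking std equivalent of `tokio::net::lookup_host` (`resolve_contact_point` is an illustrative helper, not the PR's actual code):

```rust
use std::net::{SocketAddr, ToSocketAddrs};

// Try to parse a literal "ip:port" first; fall back to DNS resolution
// for hostnames, which addr.parse::<SocketAddr>() alone cannot handle.
fn resolve_contact_point(addr: &str) -> std::io::Result<SocketAddr> {
    if let Ok(sa) = addr.parse::<SocketAddr>() {
        return Ok(sa); // already a literal socket address
    }
    addr.to_socket_addrs()? // resolves e.g. "db.example.com:9042" via DNS
        .next()
        .ok_or_else(|| {
            std::io::Error::new(std::io::ErrorKind::NotFound, "no addresses resolved")
        })
}

fn main() -> std::io::Result<()> {
    // A literal address parses without touching DNS.
    assert_eq!(resolve_contact_point("127.0.0.1:9042")?.port(), 9042);
    // A hostname goes through the resolver.
    assert!(resolve_contact_point("localhost:9042")?.ip().is_loopback());
    Ok(())
}
```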
Force-pushed 76bf605 to b26f30b
Benchmark
| Benchmark suite | Current: b26f30b | Previous: e62a853 | Ratio |
|---|---|---|---|
| cli_parse_args/no_args | 29230 ns | 29467 ns | 0.99 |
| cli_validate/valid_full | 2 ns | 3 ns | 0.67 |
| cqlshrc_parse/empty | 3759 ns | 3622 ns | 1.04 |
| cqlshrc_parse/minimal | 10410 ns | 9615 ns | 1.08 |
| cqlshrc_parse/full | 68033 ns | 66240 ns | 1.03 |
| config_merge/full_merge | 875 ns | 902 ns | 0.97 |
| end_to_end_startup/full | 150280 ns | 153420 ns | 0.98 |
| parse_multiline/6_lines | 7394 ns | 7588 ns | 0.97 |
| classify_input/empty | 13 ns | 14 ns | 0.93 |
| format_table/rows/10 | 85706 ns | 86826 ns | 0.99 |
| format_table/rows/100 | 767400 ns | 740640 ns | 1.04 |
| format_table/rows/1000 | 7640700 ns | 7416000 ns | 1.03 |
| format_expanded/rows/10 | 10212 ns | 10184 ns | 1.00 |
| format_json_100 | 48732 ns | 50118 ns | 0.97 |
| format_csv_100 | 38780 ns | 39539 ns | 0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
Summary

Adds a `ProxyAddressTranslator` that redirects all peer connections to the original contact point, enabling connections through proxies/load balancers (AWS NLB, PrivateLink, etc.)

Problem

When connecting through a proxy, the driver discovers internal node IPs from `system.peers` that are unreachable from the client, causing `Connection error: No connections in the pool`.

Solution

The scylla-rust-driver's `AddressTranslator` trait translates peer addresses discovered from `system.peers`. Since `known_node` addresses (the contact point) are never translated, we can safely always install a translator that redirects all peer addresses to the contact point.

Equivalent to scylladb/python-driver#833 (DynamicWhiteListRoundRobinPolicy) but using the rust driver's native `AddressTranslator` mechanism.

Testing

`cargo test --lib proxy_address_translator` — 5 unit tests
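The invariant the solution relies on can be modeled without the scylla crate. In this std-only sketch, the `Cluster` struct and `dial_targets` method are hypothetical simplifications: the contact point is used verbatim (mirroring how `known_node` addresses are never translated), while every peer discovered from `system.peers` is rewritten to the same endpoint.

```rust
use std::net::SocketAddr;

// Hypothetical model of the driver's view of the cluster behind a proxy.
struct Cluster {
    contact_point: SocketAddr, // used as-is, never translated
    peers: Vec<SocketAddr>,    // discovered from system.peers
}

impl Cluster {
    // Every address the driver will actually dial: the contact point,
    // plus each peer after proxy translation (which collapses them all
    // to the contact point).
    fn dial_targets(&self) -> Vec<SocketAddr> {
        let mut targets = vec![self.contact_point];
        targets.extend(self.peers.iter().map(|_| self.contact_point));
        targets
    }
}

fn main() {
    let cluster = Cluster {
        contact_point: "203.0.113.10:9042".parse().unwrap(),
        peers: vec![
            "10.0.0.1:9042".parse().unwrap(), // internal, unreachable IPs
            "10.0.0.2:9042".parse().unwrap(),
        ],
    };
    let targets = cluster.dial_targets();
    // One target per node, all going through the proxy endpoint.
    assert_eq!(targets.len(), 3);
    assert!(targets.iter().all(|a| *a == cluster.contact_point));
}
```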